fix: WebRTC signaling state guards — ICE restart, renegotiation, and glare (WT-986)#1003
Draft
fix: WebRTC signaling state guards — ICE restart, renegotiation, and glare (WT-986)#1003
Conversation
Check RTCSignalingStateStable before creating offer and re-check after the async createOffer gap to prevent TOCTOU race. Remove outdated TODO.
…ext, add unawaited
…calDescription crash WT-986
When both peers enable video simultaneously, both fire onRenegotiationNeeded while in stable state, creating a race where each sends a local offer before receiving the other's offer. This leaves both PCs in have-local-offer when __onCallSignalingEventUpdating fires, causing setRemoteDescription to fail with "Called in wrong state: have-local-offer" and dropping the call. Fix: check signalingState before setRemoteDescription in the updating handler. If have-local-offer, rollback the pending local offer first (returning to stable), then proceed normally. libwebrtc will re-fire onRenegotiationNeeded once the connection returns to stable after the answer completes.
flutter_webrtc caches signalingState on the Dart side and updates it only when the onSignalingState callback fires, not when setLocalDescription completes. The pre-check could therefore read a stale 'stable' value even though the native PC was already in have-local-offer, causing setRemoteDescription to fail unhandled. Add a try-catch on setRemoteDescription that catches the 'have-local-offer' string error thrown by flutter_webrtc, rolls back the local offer, and retries. The optimistic pre-check is kept as a fast path for the case where the Dart-side cache is already up to date.
…otiation Two separate races during simultaneous video-enable from both peers (glare): 1. RenegotiationHandler: setLocalDescription(offer) fails with "wrong state" when a concurrent __onCallSignalingEventUpdating calls setRemoteDescription between the TOCTOU guard and setLocalDescription. Previously escalated to callErrorReporter (user-visible error). Now caught as a transient race and logged at WARNING only — libwebrtc keeps [[NegotiationNeeded]] set and will re-fire onRenegotiationNeeded once the PC returns to stable. 2. __onCallSignalingEventUpdating: UpdateRequest(answer) rejected by server with 448 "SDP type answer is incompatible with session status incall" during glare, because the server is still processing the phone's own pending offer. Previously propagated out of the inner try-catch and triggered _ResetStateEvent.completeCall, dropping the call. Now caught specifically as WebtritSignalingErrorException and logged as a non-fatal warning. The PC local offer/answer exchange is already complete, media continues to flow, and the pending [[NegotiationNeeded]] causes RenegotiationHandler to re-offer once the server settles to incall state.
…rong state During a glare race the local offer may be rolled back by __onCallSignalingEventUpdating before the accepted event arrives. If the PC is already in stable when the accepted answer jsep is processed, setRemoteDescription(answer) would throw an uncaught wrong-state exception and video would silently fail to establish. Added a signalingState pre-check: if type is answer but PC is not in have-local-offer, skip setRemoteDescription and log a warning. RenegotiationHandler will fire a fresh offer on the next onRenegotiationNeeded and the video exchange completes via that cycle. Also wrapped setRemoteDescription in on String catch to prevent any residual state races from propagating as unhandled exceptions.
…ess of server response In the glare+448 scenario the server sends a CallErrorEvent (code 448) instead of the expected `updated` event. Without this fix `updating: true` is never cleared by __onCallSignalingEventUpdated, causing the video UI to stay hidden for the duration of the call. Clearing the flag unconditionally at the end of __onCallSignalingEventUpdating is safe: __onCallSignalingEventUpdated setting `false` when already `false` is a no-op.
…re-fire After a glare-resolution rollback, libwebrtc fires onAddTrack for the incoming video track but does NOT re-fire onAddStream (onAddStream fires only once per unique stream ID). The BLoC was relying solely on onAddStream to update remoteStream, so the video overlay was never mounted and the renderer was never attached to the stream. Two complementary fixes: 1. remoteVideo now falls back to the logical `video` flag when the stream is absent or has no video tracks yet. This ensures RemoteVideoViewOverlay is mounted and the RTCVideoRenderer is attached to the stream — once attached, the renderer renders any video track that arrives natively, including tracks added after the initial onAddStream. 2. onAddTrack is now forwarded as a streamAdded event so the BLoC updates remoteStream when the video track arrives on a different stream object (covers cases where Janus creates a new stream for renegotiated video).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Four related fixes for WebRTC `RTCPeerConnection` signaling state violations that caused call drops or silent media failures:
1. ICE restart guard (`__onPeerConnectionEventIceConnectionStateChanged`)
2. Renegotiation guard (`RenegotiationHandler` — extracted from `_handleRenegotiationNeeded`)
3. Glare condition fix (`__onCallSignalingEventUpdating`)
4. Post-glare race fixes (this commit)
Two residual races that prevented video from appearing after the glare rollback:
`RenegotiationHandler` TOCTOU race with `__onCallSignalingEventUpdating`: After glare rollback, `onRenegotiationNeeded` re-fires and `RenegotiationHandler` starts a new offer cycle. However, `__onCallSignalingEventUpdating` concurrently calls `setRemoteDescription`, which moves the PC out of stable between the TOCTOU guard and `setLocalDescription`. Previously this escalated to `callErrorReporter` (user-visible error + call drop). Now caught as a transient race and logged at WARNING only — libwebrtc keeps `[[NegotiationNeeded]]` set and will re-fire `onRenegotiationNeeded` once the PC returns to stable, letting `RenegotiationHandler` succeed on the next attempt.
Server 448 on the glare answer: When both phones simultaneously enable video, each phone's rolled-back offer may still be pending on the server. The server rejects the answer `UpdateRequest` with `448 "SDP type answer is incompatible with session status incall"` because it is still processing the other phone's pending offer. Previously propagated through `catch (e, s)` and called `_ResetStateEvent.completeCall`, dropping the call. Now caught specifically as `WebtritSignalingErrorException` and logged as non-fatal — the PC local exchange is already complete, media continues to flow, and `RenegotiationHandler` re-offers on the next `onRenegotiationNeeded` once the server settles back to incall state.
Root cause (glare)
Although Janus is used as a media server, the WebTrit Core signaling layer forwards `UpdateRequest` SDP between peers without serialising simultaneous renegotiations. This means glare (both peers sending offers at the same time) is possible and must be handled client-side.
Test plan